import os
import pandas as pd
import numpy as np
import importlib
from pathlib import Path
import src.util as util
import src.rv as rv
import src.lstm as lstm
import src.garch as garch
import src.pipeline2 as p2

_ = importlib.reload(util)
_ = importlib.reload(rv)
_ = importlib.reload(garch)
_ = importlib.reload(lstm)
_ = importlib.reload(p2)

BUILD_MODEL = False
RUN_EVALUATION = False

os.makedirs('temp/insample', exist_ok=True)
os.makedirs('temp/outsample', exist_ok=True)
os.makedirs('temp/pipeline2', exist_ok=True)
os.makedirs('models/lstm', exist_ok=True)
Executive Summary
Background
Problem Context and Motivations
Market makers profit from the bid-ask spread, the gap between the highest price a buyer is willing to pay and the lowest price a seller is willing to accept (O’Hara, 1995). Volatility, which measures price fluctuations in financial markets, introduces both risk and opportunity for market makers (Hasbrouck, 2006). Low volatility is associated with stable price movements and tighter bid–ask spreads, with profits made through high trade volumes. In contrast, when volatility is high, price fluctuations intensify and create uncertainty and risk, so spreads widen to insure against potential losses. Bollerslev and Melvin (1994) reported a strong positive relationship between volatility and spreads, with highly statistically significant coefficients linking conditional variance to spread levels.
For trading firms like Optiver, accurately forecasting short-term volatility is crucial for setting competitive quotes and managing execution risk, especially in high-frequency trading (HFT) and options markets (Optiver, 2021). Motivated by this, our study focuses on leveraging predicted short-term volatility to optimise quoting strategies based on the bid-ask spread. This study also considers the effects of inter-stock correlation on model performance by training a model on one stock and testing it on both a highly correlated stock and a weakly correlated stock. The aim is to see whether information about one stock can be used to make or improve predictions about another.
This context leads to the main research question of this study:
How can short-term volatility forecasts be leveraged to optimise bid-ask spread quoting strategies—balancing execution risk and market competitiveness—for HFT firms? Additionally, how does inter-stock correlation influence prediction accuracy and quoting effectiveness when models are applied to unseen stocks?
Objectives
This study aims to:
Develop short-term volatility forecasting models using high-frequency order book data, focusing on LSTM-based approaches.
Assess the effectiveness of integrating these forecasts into quoting strategies, with the goal of improving bid-ask spread predictions and supporting market-making decisions.
Investigate the generalisability of single-stock models to unseen stocks with varying correlation levels.
Prior Work and Relevance
Nelson et al. (2017) demonstrated that Long Short-Term Memory (LSTM) networks can be effectively applied to financial time series forecasting, achieving an average accuracy of 55.9% in predicting short-term stock price movements. Recent work by Wang et al. (2019) introduced the attention-enhanced AT-LSTM model, which significantly outperformed both traditional ARIMA and standard LSTM models in forecasting financial time series. The attention mechanism dynamically assigns weights to different time steps, helping the model focus on the most relevant historical information for improved prediction accuracy. Their results showed that AT-LSTM achieved the lowest Mean Absolute Percentage Error (MAPE) values across multiple indices (including the Russell 2000, DJIA, and Nasdaq), consistently outperforming ARIMA.
These studies highlight the suitability of LSTM-based approaches for high-frequency volatility forecasting in our context.
Dataset
Code
DATA_FOLDER = "data"
FEATURE_FILE = "order_book_feature.parquet"
TARGET_FILE = "order_book_target.parquet"

# Primary stock ID for model training
MODEL_STOCK_ID = 50200
# Number of time_ids to use for training
MODEL_TIMEID_COUNT = 50
# Other stocks for cross-stock performance comparison
CROSS_STOCK_IDS = [22753, 104919]
# Number of time_ids per stock for comparison
CROSS_TIMEID_COUNT = 10

feature_path = os.path.join(DATA_FOLDER, FEATURE_FILE)
target_path = os.path.join(DATA_FOLDER, TARGET_FILE)
df_features = pd.read_parquet(feature_path, engine="pyarrow")
df_target = pd.read_parquet(target_path, engine="pyarrow")

# Concatenate feature and target, then sort
df_all = (
    pd.concat([df_features, df_target], axis=0)
    .sort_values(by=["stock_id", "time_id", "seconds_in_bucket"])
    .reset_index(drop=True)
)

# Prepare main-stock training dataset
df_main_raw = df_all[df_all["stock_id"] == MODEL_STOCK_ID].copy()
main_time_ids = df_main_raw["time_id"].unique()[:MODEL_TIMEID_COUNT]

# df_main_train: training feature set for the primary stock (50 time_ids)
df_main_train = (
    df_main_raw[df_main_raw["time_id"].isin(main_time_ids)]
    .pipe(util.create_snapshot_features)
    .reset_index(drop=True)
)

unique_time_ids = df_main_raw["time_id"].unique()
test_time_ids = unique_time_ids[MODEL_TIMEID_COUNT : MODEL_TIMEID_COUNT + 10]

# df_main_test: test feature set for the primary stock (next 10 time_ids)
df_main_test = (
    df_main_raw[df_main_raw["time_id"].isin(test_time_ids)]
    .pipe(util.create_snapshot_features)
    .reset_index(drop=True)
)

# Prepare cross-stock comparison datasets
df_cross_features = {}
for stock_id in CROSS_STOCK_IDS:
    df_stock_raw = df_all[df_all["stock_id"] == stock_id].copy()
    time_ids_cross = df_stock_raw["time_id"].unique()[:CROSS_TIMEID_COUNT]
    df_stock_feat = (
        df_stock_raw[df_stock_raw["time_id"].isin(time_ids_cross)]
        .pipe(util.create_snapshot_features)
        .reset_index(drop=True)
    )
    # df_cross_features: dict of feature sets for each comparison stock (10 time_ids)
    df_cross_features[stock_id] = df_stock_feat
This study uses the Optiver Additional Dataset, which contains sequential ultra-high-frequency limit order book (LOB) snapshots for multiple stocks, structured into hourly trading windows. Specifically, order_book_feature.parquet includes 17.6 million rows from the first 30 minutes of each trading hour, and order_book_target.parquet includes 17.9 million rows from the last 30 minutes. Each row is indexed by stock_id, time_id, and seconds_in_bucket (0–3599), together defining a specific stock-hour snapshot.
The feature and target datasets were concatenated and sorted by stock_id, time_id, and seconds_in_bucket to reconstruct complete 1-hour trading periods. For modelling, we focus on a single primary stock (stock_id = 50200) for training and testing, and two additional stocks (stock_id = 22753 and 104919) for cross-stock generalisation analysis.
Methodology
The overall methodology consists of two main pipelines: the first focuses on forecasting short-term volatility, and the second uses these forecasts to inform quoting strategies. Figure 1 below provides a schematic overview of the workflow, including rolling data preparation, model building and evaluation, and final deployment of quoting strategies.
Code
from IPython.display import Image
Image(filename='resources/figure1.png')
Figure 1: Schematic workflow overview.
Feature Engineering
Feature engineering was applied to the reconstructed dataset to generate meaningful variables capturing market dynamics. For the volatility forecasting pipeline, engineered features include:
Mid price: average of bid and ask prices
Bid-ask spread: difference between the lowest ask price and the highest bid price
Weighted average price
Spread percentage
Order book imbalance
Depth ratio
Log return and log WAP change
Rolling standard deviation of log returns
Spread z-score
Volume imbalance
For the volatility-informed quoting strategy pipeline, the key input feature is the predicted short-term volatility (predicted_volatility_lead1) from the final LSTM model, combined with order book-based features including:
Weighted average price
Standardised spread percentage
Imbalance
Depth ratio
Log return
Average bid-ask spread
Detailed feature definitions and formulas are provided in the Appendix A: Feature Definitions.
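As a rough illustration of the snapshot features listed above, the following sketch computes several of them from level-1 order book columns. The column names (`bid_price1`, `ask_price1`, `bid_size1`, `ask_size1`), the function name `snapshot_features`, and the exact formulas for imbalance and depth ratio are assumptions made for this example; the project's own `util.create_snapshot_features` may define them differently.

```python
import numpy as np
import pandas as pd


def snapshot_features(book: pd.DataFrame) -> pd.DataFrame:
    """Add basic order book features to a frame of level-1 snapshots.

    Assumed input columns: bid_price1, ask_price1, bid_size1, ask_size1.
    Rows are assumed to be sorted in time within a single time_id.
    """
    out = book.copy()
    bid, ask = out["bid_price1"], out["ask_price1"]
    bid_sz, ask_sz = out["bid_size1"], out["ask_size1"]

    out["mid_price"] = (bid + ask) / 2
    out["bid_ask_spread"] = ask - bid
    out["spread_pct"] = out["bid_ask_spread"] / out["mid_price"]
    # WAP weights each price by the size resting on the opposite side.
    out["wap"] = (bid * ask_sz + ask * bid_sz) / (bid_sz + ask_sz)
    # Assumed definitions: signed size imbalance and bid/ask depth ratio.
    out["imbalance"] = (bid_sz - ask_sz) / (bid_sz + ask_sz)
    out["depth_ratio"] = bid_sz / ask_sz
    # Log return of WAP between consecutive snapshots (first row set to 0).
    out["log_return"] = np.log(out["wap"]).diff().fillna(0.0)
    return out
```

In practice the log return would be computed per `time_id` (for example with a `groupby`) so that returns are not taken across the boundary between two trading hours.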
Rolling Data Preparation
To capture short-term volatility dynamics, we adopt a rolling window approach in our data preparation. This method involves segmenting the high-frequency order book data into overlapping windows, enabling us to create multiple training samples and better capture evolving market microstructure patterns.
We experimented with different rolling window configurations, defined by three key parameters: window size (W), which determines the length of the historical data used for prediction; forecast horizon (H), which specifies how far into the future the prediction is made; and step size (S), which controls how much the window slides forward to create the next observation. To evaluate these configurations, we used a baseline Random Forest model and measured performance based on both prediction accuracy (MSE) and stability across different time periods.
Our experiments revealed a U-shaped relationship between window size and prediction error, indicating that excessively large or small windows degrade prediction performance. Finally, we found that a configuration of W=330s, H=10s, and S=5s provided the best trade-off between prediction accuracy and sample richness. We selected this configuration for all subsequent model development, ensuring consistent and reliable short-term volatility forecasts.
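The window, horizon, and step parameters can be made concrete with a small index generator. This is a minimal sketch, assuming one row per second within a `time_id` and assuming the target is the realised volatility of log returns over the horizon; the helper names `rolling_windows` and `realised_volatility` are hypothetical and not taken from the project code.

```python
import numpy as np


def rolling_windows(n_seconds: int, window: int = 330, horizon: int = 10, step: int = 5):
    """Yield (input_start, input_end, target_end) index triples.

    Inputs cover [input_start, input_end); the target covers
    [input_end, target_end). Indices are seconds within one time_id.
    """
    start = 0
    while start + window + horizon <= n_seconds:
        yield start, start + window, start + window + horizon
        start += step


def realised_volatility(log_returns: np.ndarray) -> float:
    """Square root of the sum of squared log returns (assumed target)."""
    return float(np.sqrt(np.sum(np.square(log_returns))))


# One reconstructed trading hour has 3600 one-second buckets.
windows = list(rolling_windows(3600))
print(len(windows), windows[0], windows[-1])
```

With W=330, H=10, and S=5, a full hour yields several hundred overlapping samples, which is the "sample richness" side of the trade-off described above.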
The first pipeline focuses on generating accurate short-term volatility predictions, a crucial capability for HFT firms to optimise options pricing and quoting strategies. To this end, we developed a robust model to capture the complex volatility behaviour of the selected stock.
Our models were trained using sequential order book data enhanced with engineered features. Due to computational constraints, we initially selected 50 consecutive time_ids from stock 50200 as the modelling sample, using an 80/20 chronological train-test split to maintain temporal consistency.
We trialled three initial candidates for volatility prediction: HAR-RV (with weighted least squares as a linear baseline), GARCH, and Long Short-Term Memory (LSTM) networks. The optimal GARCH hyperparameters (p=1, q=2) were determined by grid search, minimising RMSE. LSTM networks, though less interpretable due to their gated architecture, demonstrated superior capability in capturing long-term dependencies and nonlinear volatility patterns (Srivatsavaya, 2023). We iteratively tuned the LSTM model’s architecture and hyperparameters to achieve the best trade-off between complexity and performance.
The final volatility forecasting model is a double-layer bidirectional LSTM architecture integrating a Mixture-of-Experts (MoE) approach. It consists of two stacked bidirectional LSTM layers (128 hidden units each) and a dense fusion layer (64 ReLU units). The MoE head features two expert outputs—one capturing normal market behaviour and another focused on spikes—combined via a learned spike probability gating mechanism. To effectively balance precision and stability during training, we employed a two-stage loss strategy: an initial weighted MSE loss focusing on volatility magnitude, followed by a combined log-cosh and focal BCE loss to refine spike detection and robustness. This dual-headed design allows the model to dynamically adapt to different volatility regimes, enhancing both accuracy and interpretability.
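To make the gating idea more tangible, the sketch below blends a "normal" expert and a "spike" expert with a sigmoid gate, and implements a log-cosh loss of the kind used in the second training stage. It is a NumPy illustration under stated assumptions: the convex blend `(1 - p) * normal + p * spike` and the function names `moe_combine` and `log_cosh_loss` are hypothetical and may not match the actual model head.

```python
import numpy as np


def moe_combine(normal_pred, spike_pred, spike_logit):
    """Blend two expert outputs using a sigmoid gate on the spike logit.

    Assumed form: output = (1 - p_spike) * normal + p_spike * spike.
    """
    p_spike = 1.0 / (1.0 + np.exp(-np.asarray(spike_logit, dtype=float)))
    normal_pred = np.asarray(normal_pred, dtype=float)
    spike_pred = np.asarray(spike_pred, dtype=float)
    blended = (1.0 - p_spike) * normal_pred + p_spike * spike_pred
    return blended, p_spike


def log_cosh_loss(y_true, y_pred) -> float:
    """Mean log-cosh error: quadratic near zero, roughly linear for large errors."""
    err = np.abs(np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float))
    # Numerically stable identity: log(cosh(x)) = |x| + log(1 + exp(-2|x|)) - log(2).
    return float(np.mean(err + np.log1p(np.exp(-2.0 * err)) - np.log(2.0)))
```

When the gate is confident that no spike is occurring (very negative logit), the output is dominated by the normal expert; a large positive logit shifts weight to the spike expert.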
To prevent data leakage, we ensured that the 80/20 train-test split strictly adhered to non-overlapping time_ids. Additional safeguards included separate data generators for training and validation, feature normalisation within each rolling window, and clipping outlier values to stabilise predictions.
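The leakage safeguards described above can be sketched as follows: a chronological split on whole `time_id` values, and per-window z-scoring with clipping. This is an illustrative sketch only; the function names and the clip threshold of 5 standard deviations are assumptions, not values taken from the project.

```python
import numpy as np
import pandas as pd


def split_by_time_id(df: pd.DataFrame, train_frac: float = 0.8):
    """Split rows chronologically so train and test never share a time_id."""
    time_ids = df["time_id"].unique()  # order of first appearance
    n_train = int(len(time_ids) * train_frac)
    train_ids, test_ids = time_ids[:n_train], time_ids[n_train:]
    train = df[df["time_id"].isin(train_ids)]
    test = df[df["time_id"].isin(test_ids)]
    return train, test


def normalise_window(window: np.ndarray, clip: float = 5.0) -> np.ndarray:
    """Z-score each feature column using only this window's statistics, then clip.

    The clip threshold is an assumed value chosen for illustration.
    """
    mean = window.mean(axis=0)
    std = window.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # avoid division by zero for flat columns
    return np.clip((window - mean) / std, -clip, clip)
```

Because the statistics come from the window itself, no information from later windows (or from the test period) enters the normalised inputs.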
Model selection was based on RMSE performance, with further discussion and detailed evaluation metrics provided in the following sections. This thorough approach ensures that the volatility forecasts feeding into our quoting strategy are accurate, robust, and well-suited to real-world trading environments.
The quoting strategy model uses the predicted volatility from the volatility forecasting model, together with current order book signals, to generate quoting strategies that adapt to market conditions. This enhances interpretability and helps market makers adjust bid-ask spreads for high-frequency trading (HFT).
Key features include the predicted short-term volatility (predicted_volatility_lead1) and engineered order book features: spread_pct_scaled, wap, imbalance, depth_ratio, log_return, and bid_ask_spread. A rolling window approach with a 330-bucket segment and a 10-bucket stride was used, shifting the bid-ask spread target forward by one step to avoid data leakage.
An XGBoost model was implemented to estimate the next-period bid-ask spread. XGBoost (Extreme Gradient Boosting) is a gradient-boosted decision tree method, valued for its high accuracy, scalability, and built-in regularisation to prevent overfitting. While it is not traditionally suited to time series data, the use of rolling windows and lagged features overcomes this limitation (XGBoosting, 2023). Z-score normalisation (StandardScaler) was applied to standardise numerical features with high variance. A chronological 80/20 train-test split preserved the order of the time series, and 5-fold cross-validation was also applied. Hyperparameters were selected by grid search over 768 candidate configurations, keeping the best-performing one.
To assess performance, the model was compared against a naive baseline (previous spread as prediction), using metrics including MSE, MAE, RMSE, R², absolute error (AE), squared error (SE), and percentage error (PE). The mid-price, calculated as the average of the best bid and ask, was used as a simple estimator for the next-period mid-price.
The final quoting prices were generated using:
\[\text{bid} = \text{mid-price} - \frac{\text{spread}}{2}\]
\[\text{ask} = \text{mid-price} + \frac{\text{spread}}{2}\]
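The two quoting formulas translate directly into code. The helper below is a minimal sketch; the name `make_quotes` and the non-negativity check are additions for illustration.

```python
def make_quotes(mid_price: float, predicted_spread: float) -> tuple[float, float]:
    """Return (bid, ask) placed symmetrically around the mid-price."""
    if predicted_spread < 0:
        raise ValueError("predicted spread must be non-negative")
    half_spread = predicted_spread / 2.0
    return mid_price - half_spread, mid_price + half_spread


bid, ask = make_quotes(mid_price=100.0, predicted_spread=0.5)
print(bid, ask)  # 99.75 100.25
```

Because the mid-price is held at its current value, all of the model's influence on the quotes comes through the predicted spread.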
Evaluation Metrics
To assess the performance of our two main models—Volatility Forecasting and Volatility-Informed Quoting Strategy Model—we used complementary evaluation metrics tailored to each stage’s goals.
Volatility Forecasting Model (Pipeline 1)
The primary objective of the volatility forecasting model is to provide accurate short-term volatility predictions that serve as key inputs for the quoting model.
Metrics used include:
RMSE (Root Mean Squared Error): Measures the average magnitude of prediction errors. It is particularly important as smaller errors directly contribute to more accurate quoting model inputs. \[
\text{RMSE} = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 }
\]
QLIKE (Quasi-Likelihood Loss): Focuses on the accuracy of volatility forecasts relative to actual variance, which is important for financial volatility modeling. \[
\text{QLIKE} = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{y_i^2}{\hat{y}_i^2} - \log \left( \frac{y_i^2}{\hat{y}_i^2} \right) - 1 \right)
\]
MSE (Mean Squared Error): Provides a standard measure of average squared prediction errors. \[
\text{MSE} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2
\]
Inference Time: Measures the computational efficiency for each prediction for high-frequency environments.
Among these, RMSE is considered the most critical metric because the quoting model depends on accurate volatility forecasts. Lower RMSE in Pipeline 1 leads to more precise bid-ask spread predictions in Pipeline 2, directly impacting quoting effectiveness.
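The three error metrics defined above can be written in a few lines of NumPy. This sketch follows the formulas as stated; the small `eps` term in `qlike` is an added assumption to guard against division by zero and is not part of the definition.

```python
import numpy as np


def mse(y, y_hat) -> float:
    """Mean squared error."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean((y - y_hat) ** 2))


def rmse(y, y_hat) -> float:
    """Root mean squared error."""
    return float(np.sqrt(mse(y, y_hat)))


def qlike(y, y_hat, eps: float = 1e-12) -> float:
    """Quasi-likelihood loss on squared volatilities (variances)."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    ratio = (y ** 2 + eps) / (y_hat ** 2 + eps)  # eps is an assumed safeguard
    return float(np.mean(ratio - np.log(ratio) - 1.0))
```

QLIKE is zero for a perfect forecast and positive otherwise, and it penalises under-prediction of variance more heavily than over-prediction of the same ratio.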
Volatility-Informed Quoting Strategy Model (Pipeline 2)
To evaluate the quoting strategy model’s performance, we employed four microstructure-based metrics:
Hit Ratio: Measures how often our quotes are competitive enough to be executed. \[
\begin{aligned}
\text{Hit Ratio} &=
\frac{ \text{Number of competitive quotes} }{ \text{Total number of quotes} } \\\\
& \text{where bid} \geq \text{market bid and ask} \leq \text{market ask}
\end{aligned}
\]
Inside-Spread Quote Ratio: Assesses whether quotes are placed inside the market spread for better execution. \[
\begin{aligned}
\text{Inside-Spread Quote Ratio} &=
\frac{ \text{Number of quotes inside market spread} }{ \text{Total number of quotes} } \\\\
& \text{where bid} > \text{market bid and ask} < \text{market ask}
\end{aligned}
\]
Average Quote Effectiveness: Evaluates the average improvement of our quoted prices over the market reference. \[
\begin{aligned}
\text{Effectiveness} &=
\frac{1}{N} \sum_{i=1}^{N} \frac{1}{2} \bigl( (\text{Quoted Bid}_i - \text{Market Bid}_i) + (\text{Market Ask}_i - \text{Quoted Ask}_i) \bigr) \\\\
& \text{where } N \text{ is the total number of quotes}
\end{aligned}
\]
Sharpe Ratio of Quote Effectiveness: Measures the consistency and risk-adjusted performance of our quote placements. \[
\text{Sharpe Ratio} = \frac{\mathbb{E}[\text{Quote Effectiveness}]}{\text{Std}[\text{Quote Effectiveness}]}
\]
These metrics provide a comprehensive evaluation of how effectively the quoting model balances execution competitiveness, market efficiency, and consistency under varying market conditions.
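The four quoting metrics can be computed together from arrays of quoted and market prices. The following is a minimal sketch of the formulas above; the function name `quote_metrics` and the use of the population standard deviation are assumptions.

```python
import numpy as np


def quote_metrics(quoted_bid, quoted_ask, market_bid, market_ask) -> dict:
    """Compute hit ratio, inside-spread ratio, mean effectiveness, and Sharpe ratio."""
    qb, qa, mb, ma = (
        np.asarray(a, dtype=float)
        for a in (quoted_bid, quoted_ask, market_bid, market_ask)
    )
    hit = (qb >= mb) & (qa <= ma)    # at or better than the market on both sides
    inside = (qb > mb) & (qa < ma)   # strictly inside the market spread
    effectiveness = 0.5 * ((qb - mb) + (ma - qa))
    std = effectiveness.std()        # population standard deviation (assumed)
    sharpe = float(effectiveness.mean() / std) if std > 0 else float("nan")
    return {
        "hit_ratio": float(hit.mean()),
        "inside_spread_ratio": float(inside.mean()),
        "avg_effectiveness": float(effectiveness.mean()),
        "sharpe": sharpe,
    }
```

Positive effectiveness means the quote improves on the market on average; a Sharpe ratio near zero indicates that any improvement is small relative to its variability.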
Figure 2. Model Performance Comparison: Boxplot comparison of RMSE values across models, showing that the bidirectional LSTM model achieves a balance of low mean RMSE and small variance, indicating superior predictive accuracy and stability.
Based on the RMSE robustness comparison across models, the bidirectional LSTM model was selected as the final volatility forecasting model. While other models may show slightly lower mean RMSE or narrower variance, the bidirectional LSTM demonstrates an optimal balance between these two aspects—achieving both consistently low prediction errors and limited variance across different trading periods. This balance is crucial for real-world deployment, as it ensures that the volatility forecasts remain accurate without significant fluctuations under varying market conditions. Such stability and reliability are essential for providing consistent and actionable insights to support quoting strategies in high-frequency trading environments.
Table 1: Average evaluation metrics for the volatility prediction models across the test data. The LSTM model outperforms WLS and GARCH in both RMSE and QLIKE, demonstrating more accurate and consistent volatility forecasts.
Our final LSTM model outperforms the baseline models across all key metrics. Specifically, it achieves an average RMSE that is ~20% lower than the WLS baseline and ~36% lower than the GARCH baseline. Similarly, the LSTM’s MSE is ~52% lower than WLS and ~68% lower than GARCH, while also exhibiting the lowest QLIKE value, indicating more accurate and consistent volatility predictions. Although the LSTM model requires longer inference time (0.058s compared to 0.00003s for WLS), this trade-off is justified by its superior predictive performance, supporting its selection as the final volatility forecasting model for our quoting pipeline.
Figure 3: RMSE robustness of the final LSTM model when tested on the in-sample stock (50200), a high-correlation stock (104919), and a low-correlation stock (22753).
We tested the final LSTM volatility prediction model on three stock settings: in-sample (50200), a highly correlated stock (104919), and a low-correlation stock (22753). The correlation was determined by comparing mean log returns of each stock relative to the training stock (50200).
Our results show that while the in-sample RMSE is the lowest—indicating the best fit—the model surprisingly performs worse on the high-correlation stock than the low-correlation stock. Specifically, the high-correlation stock has a higher RMSE and narrower variance than the low-correlation stock. This suggests that despite strong historical co-movement, volatility dynamics for highly correlated stocks may still differ significantly, limiting the predictive power of a single-stock model.
Figure 4: Quote effectiveness over time for the quoting strategy model. This time series captures the average price improvement of our quotes compared to market reference prices.
With a hit ratio of 45.98%, the model effectively balances aggressiveness and passivity, managing to get filled roughly half the time. This reflects a well-calibrated trade-off between execution probability and pricing precision.
The average quote effectiveness is near zero, and the Sharpe ratio (-0.0992) is close to flat, suggesting no exploitable inefficiencies in quoting placement. The model is aligned with market pricing but lacks predictive edge.
The line plot of quote effectiveness over time above reveals a stable, stationary process fluctuating around zero. There is no discernible drift or trend in performance, implying that the quoting logic maintains a neutral stance across various market conditions. This consistency supports the idea that the model does not degrade over time and can be reliably deployed without frequent recalibration.
Discussion
Interpretation of Results
Volatility Forecasting: Effective but Conservative
Our LSTM model outperformed traditional models like HAR-RV and GARCH, achieving a ~20% reduction in RMSE compared to the WLS baseline model, indicating more precise short-term volatility predictions. This finding aligns with Zhang et al. (2022), who demonstrated that neural networks outperform linear regression and tree-based models in forecasting intra-day realized volatility due to their ability to capture complex latent interactions among variables. However, the LSTM still underreacts to sharp volatility spikes, reflecting its tendency to smooth predictions, a known limitation of standard LSTM models. This smoothing effect reduced responsiveness during high-risk windows, which could be critical in HFT settings. The introduction of the spike gate and fusion layer improved sensitivity, but future work could explore attention-based or transformer variants to better capture sudden changes.
Quoting Strategy: Realistic but Limited Predictive Edge
The quote placement model built on XGBoost incorporated predicted volatility and order book features to forecast bid-ask spreads. While the hit ratio (45.98%) suggests that the model can generate executable quotes a little under half the time, other metrics like quote effectiveness and Sharpe ratio hovered near zero, indicating a lack of systematic pricing advantage over the naive baseline. This outcome likely reflects the stable, low-volatility nature of the dataset and the use of a static mid-price estimator. Without more variability in the target or dynamic mid-price forecasting, the model’s predictive edge remains constrained.
Our tests on a high-correlation stock and a low-correlation stock revealed that correlation alone does not guarantee predictive transferability. Surprisingly, the model performed worse on the highly correlated stock, challenging the assumption that strong co-movement implies good model generalisation. This suggests that differences in order book microstructure and liquidity can outweigh price-level similarities.
Practical Relevance and Application
This study provides market makers and quantitative traders with short-term volatility forecasts to optimise bid-ask quoting strategies, balancing execution risk and market competitiveness in high-frequency trading environments. The final LSTM-based model’s predictions are implemented in an interactive dashboard, which visualises predicted vs actual volatility, evaluates cross-stock performance, and demonstrates how forecasts can be used to guide quoting decisions. Additional screenshots and functionality details are included in the Appendix B: Shiny App. The models and insights developed here are intended for market makers, traders, and quantitative strategists seeking to refine quoting strategies and manage risk in dynamic markets.
Limitations and Future Work
A key limitation of our volatility forecasting pipeline is the LSTM model’s tendency to smooth out sharp volatility spikes, causing delayed responses to sudden market changes. Our attempts to use an attention-based LSTM architecture resulted in even smoother predictions, suggesting that our feature engineering or loss functions may not have been well aligned with capturing such rapid spikes. Additionally, although a cross-stock analysis was conducted, the forecasting model itself did not incorporate explicit cross-stock features. This limits its generalisability to other assets despite strong historical correlations. Computational constraints and limited time also prevented us from training more complex, multi-stock models and expanding the temporal coverage of our dataset.
The quoting strategy model similarly assumes a static mid-price for the next interval, without explicitly modeling its dynamic evolution, and does not account for practical constraints like inventory management or risk appetite. These oversights can impact real-world quoting decisions, where such factors are critical.
Future work could address these limitations by exploring attention-based or time-aware architectures combined with refined feature engineering to better capture sudden volatility shifts. Incorporating explicit cross-stock features and training generalised multi-stock models would likely improve predictive stability across diverse assets. Finally, testing adaptive quoting algorithms beyond XGBoost and developing a dedicated mid-price prediction model could make quoting strategies more dynamic and responsive to market changes.
Conclusion
This study demonstrates that LSTM-based models significantly outperform traditional approaches in short-term volatility forecasting, reducing RMSE by 20% and 36% compared to WLS and GARCH, respectively. Incorporating these forecasts into an XGBoost-based quoting strategy improved bid-ask spread predictions, confirming the practical relevance of our two-pipeline approach for high-frequency trading environments. However, cross-stock generalisation remains a challenge, as historical correlation alone does not guarantee predictive transferability—highlighting the need for future models to explicitly incorporate shared microstructure features.
- Performed feature engineering
- Built and tuned LSTM model for volatility forecasting
- Integrated WLS, HAR-RV, GARCH models into the workflow
- Developed evaluation pipeline
- Designed presentation slides
- Consolidated and debugged group code into unified, reproducible pipeline in final report
Chenghao Kang
540234745
- Literature review
- Tested and improved XGBoost model for Pipeline 2
- Evaluated Pipeline 1 model and identified the most suitable for Pipeline 2
- Contributed to construction of Pipeline 2
- Final presentation
- Report: Pipeline 2 section and supplemented other sections
Oscar Pham
530417214
- Literature review for Pipeline 2
- Tested LSTM and ARMA-GARCH for Pipeline 1
- Trained and tested models for Pipeline 2
- Developed naive quoting strategy
- Researched evaluation metrics/techniques for quoting strategy
- Final report: Pipeline 2 methods/evaluation/limitations, Discussion/Limitations
Jiayi Li
530109516
- Tested ARMA-GARCH and ARIMA models for Pipeline 1
- Literature review
- Interactive dashboard (reformed using Shiny, Pipeline 2 tab)
- Final presentation
- Final report: interpretation and implications, summary of key findings, significance
Ella Jones
520434145
- Literature review
- Initial XGBoost modelling test
- Evaluation of Pipeline 1 through inter-stock correlation
- Created Figure 1 Schematic Workflow (Presentation and Report)
- Final Presentation: contribution to slides and script
- Final Report: Method Pipeline 1 and thorough editing
References
Appendices
Appendix A: Feature Definitions
Intermediate Variables
Feature
Definition
Formula
Mid price
Average of best bid and best ask prices
\(\frac{\text{Bid Price} + \text{Ask Price}}{2}\)
Bid-ask spread
Difference between the lowest ask price and the highest bid price
Average raw spread over the current 330s rolling window
\(\text{Ask Price} - \text{Bid Price}\), averaged over the rolling window
Appendix B: Shiny App
Homepage
Volatility Forecasting
Quoting Strategies
Source Code
---title: "Precision Volatility Forecasting for Strategic Quote Placement in High-Frequency Trading"subtitle: "DATA3888 Data Science Capstone Project"author: "Optiver Stream, Group 22"date: "`r Sys.Date()`"format: html: code-tools: true code-fold: true fig_caption: yes embed-resources: true theme: flatly css: - https://use.fontawesome.com/releases/v5.0.6/css/all.css toc: true toc_depth: 4 toc_float: true margin-width: 350pxexecute: cache: true cache-path: _cache cache-depth: 0reference-location: margincitation-location: marginjupyter: python3---```{python}import osimport pandas as pdimport numpy as npimport importlibfrom pathlib import Pathimport src.util as utilimport src.rv as rvimport src.lstm as lstmimport src.garch as garchimport src.pipeline2 as p2_ = importlib.reload(util)_ = importlib.reload(rv)_ = importlib.reload(garch)_ = importlib.reload(lstm)_ = importlib.reload(p2)BUILD_MODEL =FalseRUN_EVALUATION =Falseos.makedirs('temp/insample', exist_ok=True)os.makedirs('temp/outsample', exist_ok=True)os.makedirs('temp/pipeline2', exist_ok=True)os.makedirs('models/lstm', exist_ok=True)```# Executive Summary# Background## Problem Context and MotivationsMarket makers profit off the bid-ask spread, the discrepancy between the highest price a buyer is willing to pay and the lowest price a seller is willing to accept (O’Hara, 1995).Volatility, which measures price fluctuations in financial markets, introduces both risk and opportunity to market makers (Hasbrouck, 2006).Low volatility indicates stable price movements, tighter bid–ask spreads, and profits made through high trade volumes.In contrast, during high volatility, price fluctuations heighten—creating uncertainty and risk, thus spreads widen to insure against potential losses.Bollerslev and Melvin (1994) stated that there is a strong positive relationship between volatility and spreads; increased volatility evidently widens bid–ask spreads, with highly statistically significant coefficients linking conditional 
variance to spread levels.For trading firms like Optiver, accurately forecasting short-term volatility is crucial for setting competitive quotes and managing execution risk, especially in high-frequency trading (HFT) and options markets (Optiver, 2021).Motivated by this, our study focuses on leveraging predicted short-term volatility to optimise quoting strategies based on bid-ask spread.This study also considers the effects of inter-stock correlation on model performance by training a model on one stock and testing it on both a highly correlated and uncorrelated stock.The aim being to see if information about one stock can be used to improve and or make predictions about another.This context leads to the main research question of this study:> **How can short-term volatility forecasts be leveraged to optimise bid-ask spread quoting strategies—balancing execution risk and market competitiveness—for HFT firms? Additionally, how does inter-stock correlation influence prediction accuracy and quoting effectiveness when models are applied to unseen stocks?**## ObjectivesThis study aims to:1. Develop short-term volatility forecasting models using high-frequency order book data, focusing on LSTM-based approaches.2. Assess the effectiveness of integrating these forecasts into quoting strategies, with the goal of improving bid-ask spread predictions and supporting market-making decisions.3. Investigate the generalisability of single-stock models to unseen stocks with varying correlation levels.## Prior Work and RelevanceNelson et al. (2017) demonstrated that Long Short-Term Memory (LSTM) networks can be effectively applied to financial time series forecasting, achieving an average accuracy of 55.9% in predicting short-term stock price movements.Recent work by Wang et al. 
(2019) introduced the attention-enhanced AT-LSTM model, which significantly outperformed both traditional ARIMA and standard LSTM models in forecasting financial time series.
The attention mechanism dynamically assigns weights to different time steps, helping the model focus on the most relevant historical information for improved prediction accuracy.
Their results showed that AT-LSTM achieved the lowest Mean Absolute Percentage Error (MAPE) values across multiple indices (including the Russell 2000, DJIA, and Nasdaq), consistently outperforming ARIMA.
These studies support the suitability of LSTM-based approaches for high-frequency volatility forecasting in our context.

## Dataset

```{python}
DATA_FOLDER = "data"
FEATURE_FILE = "order_book_feature.parquet"
TARGET_FILE = "order_book_target.parquet"

# Primary stock ID for model training
MODEL_STOCK_ID = 50200
# Number of time_ids to use for training
MODEL_TIMEID_COUNT = 50
# Other stocks for cross-stock performance comparison
CROSS_STOCK_IDS = [22753, 104919]
# Number of time_ids per stock for comparison
CROSS_TIMEID_COUNT = 10

feature_path = os.path.join(DATA_FOLDER, FEATURE_FILE)
target_path = os.path.join(DATA_FOLDER, TARGET_FILE)
df_features = pd.read_parquet(feature_path, engine="pyarrow")
df_target = pd.read_parquet(target_path, engine="pyarrow")

# Concatenate feature and target, then sort
df_all = (
    pd.concat([df_features, df_target], axis=0)
    .sort_values(by=["stock_id", "time_id", "seconds_in_bucket"])
    .reset_index(drop=True)
)

# Prepare main-stock training dataset
df_main_raw = df_all[df_all["stock_id"] == MODEL_STOCK_ID].copy()
main_time_ids = df_main_raw["time_id"].unique()[:MODEL_TIMEID_COUNT]

# df_main_train: training feature set for the primary stock (50 time_ids)
df_main_train = (
    df_main_raw[df_main_raw["time_id"].isin(main_time_ids)]
    .pipe(util.create_snapshot_features)
    .reset_index(drop=True)
)

unique_time_ids = df_main_raw["time_id"].unique()
test_time_ids = unique_time_ids[MODEL_TIMEID_COUNT : MODEL_TIMEID_COUNT + 10]

# df_main_test: test feature set for the primary stock (next 10 time_ids)
df_main_test = (
    df_main_raw[df_main_raw["time_id"].isin(test_time_ids)]
    .pipe(util.create_snapshot_features)
    .reset_index(drop=True)
)

# Prepare cross-stock comparison datasets
df_cross_features = {}
for stock_id in CROSS_STOCK_IDS:
    df_stock_raw = df_all[df_all["stock_id"] == stock_id].copy()
    time_ids_cross = df_stock_raw["time_id"].unique()[:CROSS_TIMEID_COUNT]
    df_stock_feat = (
        df_stock_raw[df_stock_raw["time_id"].isin(time_ids_cross)]
        .pipe(util.create_snapshot_features)
        .reset_index(drop=True)
    )
    # df_cross_features: dict of feature sets for each comparison stock (10 time_ids)
    df_cross_features[stock_id] = df_stock_feat
```

This study uses the Optiver Additional Dataset, which contains sequential ultra-high-frequency limit order book (LOB) snapshots for multiple stocks, structured into hourly trading windows.
Specifically, `order_book_feature.parquet` includes 17.6 million rows from the first 30 minutes of each trading hour, and `order_book_target.parquet` includes 17.9 million rows from the last 30 minutes.
Each row is indexed by `stock_id`, `time_id`, and `seconds_in_bucket` (0–3599), together defining a specific stock-hour snapshot.
The feature and target datasets were concatenated and sorted by `stock_id`, `time_id`, and `seconds_in_bucket` to reconstruct complete 1-hour trading periods.
For modelling, we focus on a single primary stock (`stock_id` = 50200) for training and testing, and two additional stocks (`stock_id` = 22753 and 104919) for cross-stock generalisation analysis.

# Methodology

The overall methodology consists of two main pipelines: the first focuses on forecasting short-term volatility, and the second uses these forecasts to inform quoting strategies.
**Figure 1** below provides a schematic overview of the workflow, including rolling data preparation, model building and evaluation, and final deployment of quoting strategies.

```{python}
#| fig-cap: "Figure 1: Schematic workflow overview."
from IPython.display import Image
Image(filename='resources/figure1.png')
```

## Feature Engineering

Feature engineering was applied to the reconstructed dataset to generate meaningful variables capturing market dynamics.
For the volatility forecasting pipeline, engineered features include:

- Mid price: average of bid and ask prices
- Bid-ask spread: difference between the lowest ask price and the highest bid price
- Weighted average price
- Spread percentage
- Order book imbalance
- Depth ratio
- Log return and log WAP change
- Rolling standard deviation of log returns
- Spread z-score
- Volume imbalance

For the volatility-informed quoting strategy pipeline, the key input feature is the predicted short-term volatility (`predicted_volatility_lead1`) from the final LSTM model, combined with order book-based features including:

- Weighted average price
- Standardised spread percentage
- Imbalance
- Depth ratio
- Log return
- Average bid-ask spread

Detailed feature definitions and formulas are provided in Appendix A: Feature Definitions.

## Rolling Data Preparation

To capture short-term volatility dynamics, we adopt a rolling window approach in our data preparation.
This method involves segmenting the high-frequency order book data into overlapping windows, enabling us to create multiple training samples and better capture evolving market microstructure patterns.
We experimented with different rolling window configurations, defined by three key parameters: window size (W), which determines the length of the historical data used for prediction; forecast horizon (H), which specifies how far into the future the prediction is made; and step size (S), which controls how much the window slides forward to create the next observation.
To evaluate these configurations, we used a baseline Random Forest model and measured performance based on both prediction accuracy (MSE) and stability across different time periods.
Our experiments revealed a U-shaped relationship between window size and
prediction error, indicating that excessively large or small windows degrade prediction performance.
We found that a configuration of `W=330s`, `H=10s`, and `S=5s` provided the best trade-off between prediction accuracy and sample richness.
We selected this configuration for all subsequent model development, ensuring consistent and reliable short-term volatility forecasts.

## Volatility Forecasting Models

```{python}
feature_cols = [
    "wap", "spread_pct", "imbalance", "depth_ratio", "log_return",
    "log_wap_change", "rolling_std_logret", "spread_zscore", "volume_imbalance"
]

# Train every candidate model and cache its validation predictions
if BUILD_MODEL:
    _, wls_val_df = rv.wls(df_main_train)
    wls_val_df.to_csv('temp/insample/wls_val_df.csv')

    garch_val_df = garch.garch(df_main_train)
    garch_val_df.to_csv('temp/insample/garch_val_df.csv')

    _, baseline_val_df = lstm.baseline(df_main_train, epochs=50)
    baseline_val_df.to_csv('temp/insample/baseline_val_df.csv')

    _, moe_val_df = lstm.moe(df_main_train, feature_cols, epochs=50)
    moe_val_df.to_csv('temp/insample/moe_val_df.csv')

    _, _, moe_staged_val_df = lstm.moe_staged(df_main_train, feature_cols, epochs=50)
    moe_staged_val_df.to_csv('temp/insample/moe_staged_val_df.csv')
```

The first pipeline focuses on generating accurate short-term volatility predictions, a crucial capability for HFT firms to optimise options pricing and quoting strategies.
To this end, we developed a model to capture the complex volatility behaviour of the selected stock.
Our models were trained using sequential order book data enhanced with engineered features.
Due to computational constraints, we restricted model development to the first 50 consecutive `time_id`s of stock 50200, using an 80/20 chronological train-test split to maintain temporal consistency.

We trialled three initial candidates for volatility prediction: HAR-RV (fitted with weighted least squares as a linear baseline), GARCH, and Long Short-Term Memory (LSTM) networks.
The optimal GARCH hyperparameters (`p=1`, `q=2`) were determined by grid search,
minimising RMSE.
LSTM networks, though less interpretable due to their gated architecture, demonstrated superior capability in capturing long-term dependencies and nonlinear volatility patterns (Srivatsavaya, 2023).
We iteratively tuned the LSTM model’s architecture and hyperparameters to achieve the best trade-off between complexity and performance.

The final volatility forecasting model is a double-layer bidirectional LSTM architecture integrating a Mixture-of-Experts (MoE) approach.
It consists of two stacked bidirectional LSTM layers (128 hidden units each) and a dense fusion layer (64 ReLU units).
The MoE head features two expert outputs, one capturing normal market behaviour and another focused on spikes, combined via a learned spike probability gating mechanism.
To balance precision and stability during training, we employed a two-stage loss strategy: an initial weighted MSE loss focusing on volatility magnitude, followed by a combined log-cosh and focal BCE loss to refine spike detection and robustness.
This dual-headed design allows the model to dynamically adapt to different volatility regimes, enhancing both accuracy and interpretability.

To prevent data leakage, we ensured that the 80/20 train-test split strictly adhered to non-overlapping `time_ids`.
Additional safeguards included separate data generators for training and validation, feature normalisation within each rolling window, and clipping outlier values to stabilise predictions.
Model selection was based on RMSE performance, with further discussion and detailed evaluation metrics provided in the following sections.
This approach is intended to ensure that the volatility forecasts feeding into our quoting strategy are accurate, robust, and suited to real-world trading environments.

## Volatility-Informed Quoting Strategy Model

```{python}
# prepare lstm prediction from pipeline 1

# Paths to the trained LSTM and its scalers (same values as in the cross-stock
# analysis cell further down), defined here so the cache-miss branch below
# does not raise a NameError.
model_path = "models/lstm/moe_staged.h5"
scaler_path = "models/lstm/moe_staged_scalers.pkl"

cache_dir = Path("temp/pipeline2")
cache_dir.mkdir(parents=True, exist_ok=True)
cache_file = cache_dir / "predictions_spy.csv"

if cache_file.is_file():
    pred_df = pd.read_csv(cache_file)
else:
    basic_features = [
        "wap", "spread_pct", "imbalance", "depth_ratio",
        "log_return", "log_wap_change", "rolling_std_logret",
        "spread_zscore", "volume_imbalance"
    ]
    val_df = util.out_of_sample_evaluation(
        model_path, scaler_path, df_main_train, basic_features
    )
    pred_df = val_df.rename(columns={"y_pred": "predicted_volatility_lead1"})
    pred_df.to_csv(cache_file, index=False)

# Fit the bid-ask spread model on the predicted volatility plus order book features
best_model, eval_metrics = p2.train_bid_ask_spread_model(
    df_main_train,
    pred_df,
    cache_dir="models/pipeline2",
    model_save_path="models/pipeline2/bid_ask_spread_model.pkl"
)

# Generate quotes from the fitted spread model
result = p2.generate_quote(
    pred_df,
    df_main_train,
    spread_model_path="models/pipeline2/bid_ask_spread_model.pkl",
    stock_id=50200
)
```

This quoting strategy model uses the predicted volatility from the Volatility Forecasting model and current order book signals to generate quoting strategies that adapt to market conditions.
This enhances interpretability and helps market makers adjust bid-ask spreads for high-frequency trading (HFT).
Key features include the predicted short-term volatility (`predicted_volatility_lead1`) and engineered order book features: `spread_pct_scaled`, `wap`, `imbalance`, `depth_ratio`, `log_return`, and `bid_ask_spread`.
A rolling window approach with a 330-bucket segment and a 10-bucket stride was used, shifting the bid-ask spread target forward by one step to avoid data leakage.

An XGBoost model was implemented to estimate the next-period bid-ask spread.
XGBoost (Extreme Gradient Boosting) is a form of decision tree-based machine learning, advantageous for its high accuracy, scalability, and built-in regularisation to prevent overfitting.
While it is not traditionally suited to time series data, the incorporation of rolling windows and lagged features helps to overcome this limitation (XGBoosting, 2023).
Z-score normalisation (StandardScaler) was applied to standardise numerical features with high variance.
A chronological 80/20 train-test
split was used to preserve the order of the time series, together with 5-fold cross-validation.
Hyperparameters were selected by grid search over 768 candidate configurations.
To assess performance, the model was compared against a naive baseline (previous spread as prediction), using metrics including MSE, MAE, RMSE, R², absolute error (AE), squared error (SE), and percentage error (PE).

The current mid-price, calculated as the average of the best bid and ask, was used as a simple estimator for the next-period mid-price.
The final quoting prices were generated using:

$$\text{bid} = \text{mid-price} - \frac{\text{spread}}{2}$$

$$\text{ask} = \text{mid-price} + \frac{\text{spread}}{2}$$

## Evaluation Metrics

To assess the performance of our two main models, the Volatility Forecasting model and the Volatility-Informed Quoting Strategy model, we used complementary evaluation metrics tailored to each stage’s goals.

### Volatility Forecasting Model (Pipeline 1)

The primary objective of the volatility forecasting model is to provide accurate short-term volatility predictions that serve as key inputs for the quoting model.
Metrics used include:

- **RMSE (Root Mean Squared Error)**: Measures the average magnitude of prediction errors.
It is particularly important as smaller errors directly contribute to more accurate quoting model inputs.

$$\text{RMSE} = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 }$$

- **QLIKE (Quasi-Likelihood Loss)**: Focuses on the accuracy of volatility forecasts relative to actual variance, which is important for financial volatility modelling.

$$\text{QLIKE} = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{y_i^2}{\hat{y}_i^2} - \log \left( \frac{y_i^2}{\hat{y}_i^2} \right) - 1 \right)$$

- **MSE (Mean Squared Error)**: Provides a standard measure of average squared prediction errors.

$$\text{MSE} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$$

- **Inference Time**: Measures the time taken per prediction, which matters in high-frequency environments.

Among these, RMSE is considered the most critical metric because the quoting model depends on accurate volatility forecasts. Lower RMSE in Pipeline 1 should lead to more precise bid-ask spread predictions in Pipeline 2, directly affecting quoting effectiveness.

### Volatility-Informed Quoting Strategy Model (Pipeline 2)

To evaluate the quoting strategy model’s performance, we employed four microstructure-based metrics:

- **Hit Ratio**: Measures how often our quotes are competitive enough to be executed.

$$
\begin{aligned}
\text{Hit Ratio} &= \frac{ \text{Number of competitive quotes} }{ \text{Total number of quotes} } \\\\
& \text{where quoted bid} \geq \text{market bid and quoted ask} \leq \text{market ask}
\end{aligned}
$$

- **Inside-Spread Quote Ratio**: Assesses whether quotes are placed inside the market spread for better execution.

$$
\begin{aligned}
\text{Inside-Spread Quote Ratio} &= \frac{ \text{Number of quotes inside market spread} }{ \text{Total number of quotes} } \\\\
& \text{where quoted bid} > \text{market bid and quoted ask} < \text{market ask}
\end{aligned}
$$

- **Average Quote Effectiveness**: Evaluates the average improvement of our quoted prices over the market reference.

$$
\begin{aligned}
\text{Effectiveness} &= \frac{1}{N} \sum_{i=1}^{N} \frac{1}{2} \bigl(
(\text{Quoted Bid}_i - \text{Market Bid}_i) + (\text{Market Ask}_i - \text{Quoted Ask}_i) \bigr) \\\\
& \text{where } N \text{ is the total number of quotes}
\end{aligned}
$$

- **Sharpe Ratio of Quote Effectiveness**: Measures the consistency and risk-adjusted performance of our quote placements.

$$\text{Sharpe Ratio} = \frac{\mathbb{E}[\text{Quote Effectiveness}]}{\text{Std}[\text{Quote Effectiveness}]}$$

These metrics provide a comprehensive evaluation of how effectively the quoting model balances execution competitiveness, market efficiency, and consistency under varying market conditions.

# Results

## Model Performance Comparison

```{python}
#| fig-cap: "Figure 2: Model Performance Comparison: Boxplot comparison of RMSE values across models, showing that the bidirectional LSTM model achieves a balance of low mean RMSE and small variance, indicating superior predictive accuracy and stability."
# Load the cached validation predictions of each candidate model
wls_val_df = pd.read_csv('temp/insample/wls_val_df.csv')
garch_val_df = pd.read_csv('temp/insample/garch_val_df.csv')
baseline_val_df = pd.read_csv('temp/insample/baseline_val_df.csv')
moe_val_df = pd.read_csv('temp/insample/moe_val_df.csv')
bilstm_val_df = pd.read_csv('temp/insample/moe_staged_val_df.csv')

val_dfs = {
    'wls_baseline': wls_val_df,
    'garch_baseline': garch_val_df,
    'lstm_baseline': baseline_val_df,
    'moe_lstm': moe_val_df,
    'bidirectional_lstm': bilstm_val_df
}
util.plot_rmse_robustness(val_dfs)
```

Based on the RMSE robustness comparison across models, the bidirectional LSTM model was selected as the final volatility forecasting model.
While other models may show slightly lower mean RMSE or narrower variance, the bidirectional LSTM offers the best balance between these two aspects, achieving both consistently low prediction errors and limited variance across different trading periods.
This balance is crucial for real-world deployment, as it ensures that the volatility forecasts remain accurate without significant fluctuations under varying market conditions.
Such stability and
reliability are essential for providing consistent and actionable insights to support quoting strategies in high-frequency trading environments.

```{python}
#| fig-cap: "Table 1: Average evaluation metrics for the volatility prediction models across the test data. The LSTM model outperforms WLS and GARCH in both RMSE and QLIKE, demonstrating more accurate and consistent volatility forecasts."
val_dfs_metrics = {
    'WLS': wls_val_df,
    'GARCH': garch_val_df,
    'LSTM': bilstm_val_df
}
out = util.create_evaluation_metrics_table(val_dfs_metrics)
display(out)
```

Our final LSTM model outperforms the baseline models on RMSE, MSE, and QLIKE.
Specifically, it achieves an average RMSE that is ~20% lower than the WLS baseline and ~36% lower than the GARCH baseline.
Similarly, the LSTM’s MSE is ~52% lower than WLS and ~68% lower than GARCH, while also exhibiting the lowest QLIKE value, indicating more accurate and consistent volatility predictions.
Although the LSTM model requires longer inference time (0.058s compared to 0.00003s for WLS), this trade-off is justified by its superior predictive performance, supporting its selection as the final volatility forecasting model for our quoting pipeline.

## Cross-Stock Generalisation Analysis

```{python}
#| fig-cap: "Figure 3: RMSE robustness of the final LSTM model when tested on the in-sample stock (50200), a high-correlation stock (104919), and a low-correlation stock (22753)."
model_path = "models/lstm/moe_staged.h5"
scaler_path = "models/lstm/moe_staged_scalers.pkl"
feature_cols = [
    "wap", "spread_pct", "imbalance", "depth_ratio",
    "log_return", "log_wap_change",
    "rolling_std_logret", "spread_zscore", "volume_imbalance"
]

val_dfs_cross = {}
cache_dir = 'temp/outsample'
for stock_id, df_feat in df_cross_features.items():
    cache_file = f'{cache_dir}/{stock_id}.csv'
    # Re-run the evaluation when requested or when no cached result exists
    if RUN_EVALUATION or not os.path.isfile(cache_file):
        val_df = util.out_of_sample_evaluation(model_path, scaler_path, df_feat, feature_cols)
        val_df.to_csv(cache_file, index=False)
    else:
        val_df = pd.read_csv(cache_file)
    val_dfs_cross[stock_id] = val_df

in_sample_df = pd.read_csv('temp/insample/moe_staged_val_df.csv')
val_dfs_for_plot = {
    "In Sample": in_sample_df,
    "High Correlation Stock": val_dfs_cross[104919],
    "Low Correlation Stock": val_dfs_cross[22753],
}
util.plot_rmse_robustness(val_dfs_for_plot)
```

We tested the final LSTM volatility prediction model on three stock settings: in-sample (50200), a highly correlated stock (104919), and a low-correlation stock (22753).
The correlation was determined by comparing mean log returns of each stock relative to the training stock (50200).
Our results show that while the in-sample RMSE is the lowest, indicating the best fit, the model surprisingly performs worse on the high-correlation stock than on the low-correlation stock.
Specifically, the high-correlation stock has a higher RMSE and narrower variance than the low-correlation stock.
This suggests that despite strong historical co-movement, volatility dynamics for highly correlated stocks may still differ significantly, limiting the predictive power of a single-stock model.

## Quoting Strategies (bid-ask spread prediction) Effectiveness

```{python}
#| fig-cap: "Figure 4: Quote effectiveness over time for the quoting strategy model. This time series captures the average price improvement of our quotes compared to market reference prices."
cache_dir = Path("temp/pipeline2")
cache_dir.mkdir(parents=True, exist_ok=True)
cache_file_test = cache_dir / "predictions_spy_test.csv"

if cache_file_test.is_file():
    val_df_test = pd.read_csv(cache_file_test)
else:
    basic_features = [
        "wap", "spread_pct", "imbalance", "depth_ratio",
        "log_return", "log_wap_change", "rolling_std_logret",
        "spread_zscore", "volume_imbalance"
    ]
    val_df_test = util.out_of_sample_evaluation(
        model_path, scaler_path, df_main_test, basic_features
    )
    val_df_test = val_df_test.rename(columns={"y_pred": "predicted_volatility_lead1"})
    val_df_test.to_csv(cache_file_test, index=False)

p2_metrics = p2.evaluate_quote_strategy(
    val_df_test,
    df_main_test,
    spread_model_path="models/pipeline2/bid_ask_spread_model.pkl"
)
```

With a hit ratio of 45.98%, the model balances aggressiveness and passivity, with quotes competitive enough to be filled a little under half the time.
This reflects a reasonable trade-off between execution probability and pricing precision.
The average quote effectiveness is near zero, and the Sharpe ratio (-0.0992) is close to flat, suggesting no exploitable inefficiencies in quoting placement.
The model is aligned with market pricing but lacks predictive edge.

The line plot of quote effectiveness over time above reveals a stable, stationary process fluctuating around zero.
There is no discernible drift or trend in performance, implying that the quoting logic maintains a neutral stance across various market conditions.
This consistency suggests that the model does not degrade over the test period and could be deployed without frequent recalibration.

# Discussion

## Interpretation of Results

### Volatility Forecasting: Effective but Conservative

Our LSTM model outperformed traditional models like HAR-RV and GARCH, achieving a roughly 20% reduction in RMSE compared to the WLS baseline model, indicating more precise short-term volatility predictions.
This finding
aligns with Zhang et al. (2022), who demonstrated that neural networks outperform linear regression and tree-based models in forecasting intra-day realized volatility due to their ability to capture complex latent interactions among variables.
However, the LSTM still underreacts to sharp volatility spikes, reflecting its tendency to smooth predictions, a known limitation of standard LSTM models. This smoothing effect reduced responsiveness during high-risk windows, which could be critical in HFT settings.
The introduction of the spike gate and fusion layer improved sensitivity, but future work could explore attention-based or transformer variants to better capture sudden changes.

### Quoting Strategy: Realistic but Limited Predictive Edge

The quote placement model built on XGBoost incorporated predicted volatility and order book features to forecast bid-ask spreads.
While the hit ratio (~46%) suggests that the model can generate executable quotes a little under half the time, other metrics like quote effectiveness and Sharpe ratio hovered near zero, indicating a lack of systematic pricing advantage over the naive baseline. This outcome likely reflects the stable, low-volatility nature of the dataset and the use of a static mid-price estimator.
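
As a rough illustration of the quantities discussed here, the sketch below builds symmetric quotes around a static mid-price and computes the hit ratio, inside-spread quote ratio, average quote effectiveness, and its Sharpe ratio, following the formulas in the Evaluation Metrics section. The function names `make_quotes` and `quote_metrics`, the use of the sample standard deviation, and the toy numbers are assumptions made for illustration; this is not the project's `p2.generate_quote` or `p2.evaluate_quote_strategy` code.

```python
import numpy as np


def make_quotes(mid_price, predicted_spread):
    """Symmetric quotes: bid = mid - spread/2, ask = mid + spread/2."""
    mid = np.asarray(mid_price, dtype=float)
    half_spread = np.asarray(predicted_spread, dtype=float) / 2.0
    return mid - half_spread, mid + half_spread


def quote_metrics(quoted_bid, quoted_ask, market_bid, market_ask):
    """Hypothetical helper mirroring the four Pipeline 2 metric definitions."""
    qb = np.asarray(quoted_bid, dtype=float)
    qa = np.asarray(quoted_ask, dtype=float)
    mb = np.asarray(market_bid, dtype=float)
    ma = np.asarray(market_ask, dtype=float)

    # Competitive: at or inside the market on both sides
    competitive = (qb >= mb) & (qa <= ma)
    # Strictly inside the market spread on both sides
    inside = (qb > mb) & (qa < ma)
    # Per-quote effectiveness: average price improvement over the market quotes
    effectiveness = 0.5 * ((qb - mb) + (ma - qa))

    # Assumption: sample standard deviation (ddof=1); the report does not specify
    std = effectiveness.std(ddof=1)
    sharpe = effectiveness.mean() / std if std > 0 else float("nan")
    return {
        "hit_ratio": float(competitive.mean()),
        "inside_spread_ratio": float(inside.mean()),
        "avg_effectiveness": float(effectiveness.mean()),
        "sharpe": float(sharpe),
    }


# Toy order book with made-up numbers (not taken from the dataset)
market_bid = np.array([99.0, 99.5, 100.0, 100.5])
market_ask = np.array([101.0, 100.5, 101.0, 101.0])
mid_price = (market_bid + market_ask) / 2.0          # static mid-price estimator
predicted_spread = np.array([1.0, 1.0, 2.0, 0.5])    # stand-in for model output

quoted_bid, quoted_ask = make_quotes(mid_price, predicted_spread)
print(quote_metrics(quoted_bid, quoted_ask, market_bid, market_ask))
```

When quotes are centred on the market mid-price as above, the per-quote effectiveness reduces algebraically to half the difference between the market spread and the predicted spread, so it averages to zero whenever the predicted spread matches the market spread on average. This is consistent with the near-zero effectiveness observed under a static mid-price estimator.
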
Without more variability in the target or dynamic mid-price forecasting, the model’s predictive edge remains constrained.

### Cross-Stock Generalisation: Correlation isn't Transferability

Our tests on a high-correlation stock and a low-correlation stock revealed that correlation alone does not guarantee predictive transferability.
Surprisingly, the model performed worse on the highly correlated stock, challenging the assumption that strong co-movement implies good model generalisation.
This suggests that differences in order book microstructure and liquidity can outweigh price-level similarities.

## Practical Relevance and Application

This study provides market makers and quantitative traders with short-term volatility forecasts to optimise bid-ask quoting strategies, balancing execution risk and market competitiveness in high-frequency trading environments.
The final LSTM-based model’s predictions are implemented in an interactive dashboard, which visualises predicted vs actual volatility, evaluates cross-stock performance, and demonstrates how forecasts can be used to guide quoting decisions.
Additional screenshots and functionality details are included in Appendix B: Shiny App.
The models and insights developed here are intended for market makers, traders, and quantitative strategists seeking to refine quoting strategies and manage risk in dynamic markets.

## Limitations and Future Work

A key limitation of our volatility forecasting pipeline is the LSTM model’s tendency to smooth out sharp volatility spikes, causing delayed responses to sudden market changes.
Our attempts to use an attention-based LSTM architecture resulted in even smoother predictions, suggesting that our feature engineering or loss functions may not have been well aligned with capturing such rapid spikes.
Additionally, although we conducted a cross-stock analysis, the forecasting model itself did not incorporate explicit cross-stock features.
This limits its generalisability to other assets
despite strong historical correlations.
Computational constraints and limited time also restricted us from training more complex, multi-stock models and expanding the temporal coverage of our dataset.

The quoting strategy model similarly assumes a static mid-price for the next interval, without explicitly modelling its dynamic evolution, and does not account for practical constraints like inventory management or risk appetite.
These omissions can affect real-world quoting decisions, where such factors are critical.

Future work could address these limitations by exploring attention-based or time-aware architectures combined with refined feature engineering to better capture sudden volatility shifts.
Incorporating explicit cross-stock features and training generalised multi-stock models would likely improve predictive stability across diverse assets.
Finally, testing adaptive quoting algorithms beyond XGBoost and developing a dedicated mid-price prediction model could make quoting strategies more dynamic and responsive to market changes.

# Conclusion

This study demonstrates that LSTM-based models significantly outperform traditional approaches in short-term volatility forecasting, reducing RMSE by approximately 20% and 36% compared to WLS and GARCH, respectively.
Incorporating these forecasts into an XGBoost-based quoting strategy improved bid-ask spread predictions, confirming the practical relevance of our two-pipeline approach for high-frequency trading environments.
However, cross-stock generalisation remains a challenge, as historical correlation alone does not guarantee predictive transferability, highlighting the need for future models to explicitly incorporate shared microstructure features.

# Student Contributions

| Name            | Student ID  | Contributions |
|-----------------|-------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Daisy Lim       | 520204962   | - Research, literature review <br> - Initial baseline model: HAR-RV OLS & WLS <br> - Interactive dashboard (About page, Volatility Prediction tab, integrated Quoting Strategies tab) <br> - Requirements/installation script <br> - Final presentation & contributed to slides <br> - Report: Executive Summary, Background, Discussion (limitations & improvements), Conclusion (future work) <br> - Contributed to README |
| Junrui Kang     | 530531740   | - Performed feature engineering <br> - Built and tuned LSTM model for volatility forecasting <br> - Integrated WLS, HAR-RV, GARCH models into the workflow <br> - Developed evaluation pipeline <br> - Designed presentation slides <br> - Consolidated and debugged group code into unified, reproducible pipeline in final report |
| Chenghao Kang   | 540234745   | - Literature review <br> - Tested and improved XGBoost model for Pipeline 2 <br> - Evaluated Pipeline 1 model and identified the most suitable for Pipeline 2 <br> - Contributed to construction of Pipeline 2 <br> - Final presentation <br> - Report: Pipeline 2 section and supplemented other sections |
| Oscar Pham      | 530417214   | - Literature review for Pipeline 2 <br> - Tested LSTM and ARMA-GARCH for Pipeline 1 <br> - Trained and tested models for Pipeline 2 <br> - Developed naive quoting strategy <br> - Researched evaluation metrics/techniques for quoting strategy <br> - Final report: Pipeline 2 methods/evaluation/limitations, Discussion/Limitations |
| Jiayi Li        | 530109516   | - Tested ARMA-GARCH and ARIMA models for Pipeline 1 <br> - Literature review <br> - Interactive dashboard (reformed using Shiny, Pipeline 2 tab) <br> - Final presentation <br> - Final report: interpretation and implications, summary of key findings, significance |
| Ella Jones      | 520434145   | - Literature review <br> - Initial XGBoost modelling test <br> - Evaluation of Pipeline 1 through inter-stock correlation <br> - Created Figure 1 Schematic Workflow (Presentation and Report) <br> - Final Presentation: contribution to slides and script <br> - Final Report: Method Pipeline 1 and thorough editing |

# References

# Appendices

## Appendix A: Feature Definitions

### Intermediate Variables

| Feature         | Definition                                                       | Formula                                               |
|-----------------|------------------------------------------------------------------|-------------------------------------------------------|
| Mid price       | Average of best bid and best ask prices                          | $\frac{\text{Bid Price} + \text{Ask Price}}{2}$ |
| Bid-ask spread  | Difference between the lowest ask price and the highest bid price | $\text{Ask Price} - \text{Bid Price}$ |

### Pipeline 1: Volatility Forecasting

| Feature                                  | Definition                                       | Formula                                                                                                   |
|------------------------------------------|--------------------------------------------------|-----------------------------------------------------------------------------------------------------------|
| Weighted average price (WAP)             | Weighted average price of bid and ask            | $\frac{(\text{Bid Price} \times \text{Ask Size}) + (\text{Ask Price} \times \text{Bid Size})}{\text{Bid Size} + \text{Ask Size}}$ |
| Spread percentage (spread\_pct)          | Spread as a percentage of the mid price          | $\frac{\text{Ask Price} - \text{Bid Price}}{\text{Mid Price}}$ |
| Order book imbalance (imbalance)         | Snapshot-based imbalance between bid and ask     | $\frac{\text{Bid Size} - \text{Ask Size}}{\text{Bid Size} + \text{Ask Size}}$ |
| Depth ratio                              | Market depth ratio of bid to ask size            | $\frac{\text{Bid Size}}{\text{Ask Size}}$ |
| Log return                               | Log return of WAP between snapshots              | $\log\left(\frac{\text{WAP}_t}{\text{WAP}_{t-1}}\right)$ |
| Log WAP change (log\_wap\_change)        | Difference in log WAP values                     | $\log(\text{WAP}_t) - \log(\text{WAP}_{t-1})$ |
| Rolling standard deviation of log returns | Short-term volatility of log returns            | $\text{std}(\text{log return}_{t-k}, \dots, \text{log return}_t)$ |
| Spread z-score (spread\_zscore)          | Z-score of spread percentage within a rolling window | $\frac{\text{Spread}_t - \mu_{\text{Spread}}}{\sigma_{\text{Spread}}}$ |
| Volume imbalance                         | Difference between bid and ask sizes             | $\text{Bid Size} - \text{Ask Size}$ |

### Pipeline 2: Volatility-Informed Quoting Strategies

| Feature                          | Definition                                                  | Formula                                                                                                   |
|----------------------------------|-------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------|
| Predicted short-term volatility  | Predicted short-term volatility from Pipeline 1 (LSTM output) | From LSTM model; used as a key input |
| Weighted average price (WAP)     | Weighted average price of bid and ask                       | $\frac{(\text{Bid Price} \times \text{Ask Size}) + (\text{Ask Price} \times \text{Bid Size})}{\text{Bid Size} + \text{Ask Size}}$ |
| Standardised spread percentage   | Z-score scaled spread percentage                            | $\frac{\text{Ask Price} - \text{Bid Price}}{\text{Mid Price}}$, then Z-score scaled |
| Order book imbalance (imbalance) | Snapshot-based imbalance between bid and ask                | $\frac{\text{Bid Size} - \text{Ask Size}}{\text{Bid Size} + \text{Ask Size}}$ |
| Depth ratio                      | Market depth ratio of bid to ask size                       | $\frac{\text{Bid Size}}{\text{Ask Size}}$ |
| Log return                       | Log return of WAP between snapshots                         | $\log \left( \frac{\text{WAP}_t}{\text{WAP}_{t-1}} \right)$ |
| Average bid-ask spread           | Average raw spread over the current 330s rolling window     | $\text{Ask Price} - \text{Bid Price}$, averaged over the rolling window |

## Appendix B: Shiny App